Extensive Simulations for Longest Common Subsequences: Finite Size Scaling, a Cavity Solution, and Configuration Space Properties

Author

  • J. Boutet
Abstract

Given two strings X and Y of N and M characters respectively, the Longest Common Subsequence (LCS) Problem asks for the longest sequence of (non-contiguous) matches between X and Y. Let $L_N$ be the length of an LCS of two random strings of size N. Using extensive Monte Carlo simulations for this problem, we find a finite size scaling law of the form $E(L_N)/N = \gamma_S + A_S/(\ln N \sqrt{N}) + \dots$, where $\gamma_S$ and $A_S$ are constants depending on S, the alphabet size. We provide precise estimates of $\gamma_S$ for $2 \le S \le 15$. We also study the related Bernoulli Matching model, where the different entries of the "strings" are matched independently with probability $1/S$. Let $L^B_{NM}$ be the length of a longest sequence of matches in this case, for a given instance of size $N \times M$. On the basis of a cavity-like analysis we find $\gamma^B_S(r) = (2\sqrt{rS} - r - 1)/(S - 1)$, where $\gamma^B_S(r)$ is the limit of $E(L^B_{NM})/N$ as $N \to \infty$, the ratio $r = M/N$ being fixed. This formula agrees very well with our numerical computations of $E(L^B_{NM})$. It also provides a very good approximation for $\gamma_S(r)$, the corresponding function of the random string model, the approximation getting better as S increases. We finally study the "ground state" properties of this problem. We find that the number $N_{LCS}$ of solutions typically grows exponentially with N. In other words, this system does not satisfy "Nernst's principle". This is also reflected at the level of the overlap between two LCSs chosen at random, which is found to be self-averaging and to approach a definite value $q_S < 1$ as $N \to \infty$.
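The quantities in the abstract can be illustrated with a small Python sketch. It is not the author's simulation code: it uses the textbook dynamic-programming recurrence for the LCS length, a naive Monte Carlo average over random strings, and the cavity formula read as (2*sqrt(r*S) - r - 1)/(S - 1). The function names, sample sizes, and seed are illustrative assumptions.

```python
import math
import random


def lcs_length(x, y):
    """LCS length of two sequences via the standard O(N*M) dynamic program."""
    prev = [0] * (len(y) + 1)  # row for the previous prefix of x
    for xi in x:
        curr = [0]
        for j, yj in enumerate(y, start=1):
            if xi == yj:
                curr.append(prev[j - 1] + 1)  # extend a match diagonally
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def bernoulli_matching_limit(S, r=1.0):
    """Cavity prediction quoted in the abstract (assumed reading of the formula)."""
    return (2.0 * math.sqrt(r * S) - r - 1.0) / (S - 1.0)


def estimate_mean_ratio(N, S, samples, rng):
    """Naive Monte Carlo estimate of E(L_N)/N for random strings over S letters."""
    total = 0
    for _ in range(samples):
        x = [rng.randrange(S) for _ in range(N)]
        y = [rng.randrange(S) for _ in range(N)]
        total += lcs_length(x, y)
    return total / (samples * N)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed, chosen arbitrarily for repeatability
    for S in (2, 4, 8):
        est = estimate_mean_ratio(N=200, S=S, samples=20, rng=rng)
        print(S, round(est, 3), round(bernoulli_matching_limit(S), 3))
```

At r = 1 the formula simplifies to 2/(1 + sqrt(S)). Estimates at small N sit below the large-N limit, which is the kind of finite size correction the scaling law in the abstract describes.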

Similar resources

Extensive Simulations for Longest Common Subsequences

Given two strings X and Y of N and M characters respectively, the Longest Common Subsequence (LCS) Problem asks for the longest sequence of (non-contiguous) matches between X and Y. Let $L_N$ be the length of an LCS of two random strings of size N. Using extensive Monte Carlo simulations for this problem, we find a finite size scaling law of the form $E(L_N)/N = \gamma_S + A_S/(\ln N \sqrt{N}) + \dots$, where $\gamma_S$ and $A_S$ are const...

Extensive Simulations for Longest Common Subsequences Finite

Given two strings X and Y of N and M characters respectively, the Longest Common Subsequence (LCS) Problem asks for the longest sequence of (non-contiguous) matches between X and Y. Let $L_N$ be the length of an LCS of two random strings of size N. Using extensive Monte Carlo simulations for this problem, we find a finite size scaling law of the form $E(L_N)/N = \gamma_S + A_S/(\ln N \sqrt{N}) + \dots$, where $\gamma_S$ and $A_S$ a...

FACC: A Novel Finite Automaton Based on Cloud Computing for the Multiple Longest Common Subsequences Search

Searching for the multiple longest common subsequences (MLCS) has significant applications in areas such as bioinformatics, information processing, and data mining. Although a few parallel MLCS algorithms have been proposed, their efficiency and effectiveness are not satisfactory given the increasing complexity and size of biological data. To overcome the shortcomings of...

Increasing Subsequences in Nonuniform Random Permutations

Connections between longest increasing subsequences in random permutations and eigenvalues of random matrices with complex entries have been intensely studied. This note applies properties of random elements of the finite general linear group to obtain results about the longest increasing and decreasing subsequences in non-uniform random permutations.

Fast Linear-Space Computations of Longest Common Subsequences

Space-saving techniques in computations of a longest common subsequence (LCS) of two strings are crucial in many applications, notably in molecular sequence comparisons. For about ten years, however, the only linear-space LCS algorithm known required time quadratic in the length of the input for all inputs. This paper reviews linear-space LCS computations in connection with two classical para...

Publication date: 1998